Data visualization with ggplot2

Day 2 - Introduction to Data Analysis with R

Selina Baldauf

Freie Universität Berlin - Theoretical Ecology

February 28, 2025

A ggplot showcase

Example plots you can create with ggplot

Visualization by Jake Kaupp, code available on Github

Visualizations produced by the BBC News data team

Visualization by Cédric Scherer, code available on Github

Advantages of ggplot

  • Consistent grammar/structure
  • Flexible structure allows you to produce any type of plot
  • Highly customizable appearance (themes)
  • Many extension packages that provide additional plot types, themes, colors, animation, …
    • See here for a list of ggplot extensions
  • Active community that provides help and inspiration
  • Perfect package for exploratory data analysis and beautiful plots

The data

Data set and_vertebrates with measurements of a trout and 2 salamander species in different forest sections.

  • year: observation year
  • section: CC (clear cut forest) or OG (old growth forest)
  • unittype: channel classification (C = Cascade, P = Pool, …)
  • species: Species measured
  • length_1_mm: body length [mm]
  • weight_g: body weight [g]

Coastal giant salamander (terrestrial form)
Andrews Forest Program by Lina DiGregorio via CC-BY from https://andrewsforest.oregonstate.edu

library(lterdatasampler)
and_vertebrates

Artwork by Allison Horst

ggplot(data)

The ggplot() function initializes a ggplot object. Every ggplot needs this function.

library(ggplot2)
# or library(tidyverse)

ggplot(data = and_vertebrates)
  • Empty plot because we did not specify the mapping of data variables

aes(x, y)

The aesthetics define how data variables are mapped to plot properties.

ggplot(
  data = and_vertebrates,
  mapping = aes(
    x = length_1_mm,
    y = weight_g
  )
)
  • Map variable length_1_mm to x-axis and weight_g to y-axis
  • Default scales are automatically adapted to range of data

This is the same but shorter:

ggplot(
  and_vertebrates,
  aes(
    x = length_1_mm, 
    y = weight_g)
)

Remember argument matching by position?
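
As a minimal sketch of positional matching, the following compares a plot built with named arguments to one built with positional arguments. The data frame `toy` is a hypothetical stand-in for and_vertebrates, and `layer_data()` is only used here to compare the data that ggplot2 computes for the first layer.

```r
library(ggplot2)

# Hypothetical toy data standing in for and_vertebrates
toy <- data.frame(
  length_1_mm = c(40, 55, 70, 85),
  weight_g = c(1.2, 2.9, 6.1, 11.0)
)

# Arguments matched by name
p_named <- ggplot(data = toy, mapping = aes(x = length_1_mm, y = weight_g)) +
  geom_point()

# Arguments matched by position:
# ggplot() expects data first, then mapping; aes() expects x first, then y
p_positional <- ggplot(toy, aes(length_1_mm, weight_g)) +
  geom_point()

# Compare the computed data of the point layer in both plots
same_plot_data <- isTRUE(
  all.equal(layer_data(p_named), layer_data(p_positional))
)
same_plot_data
```

Positional matching keeps calls short, but naming the arguments can be the safer choice for anything beyond data, x and y.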

geom_*

geoms define how data points are represented. There are many different geoms to choose from.

geom_point

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g
  )
) +
  geom_point()
  • New plot layers added with +
  • Warning that points could not be plotted due to missing values
  • data and aes defined in the ggplot call are inherited by all plot layers
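
The missing-value warning can be handled in two ways, sketched here under the assumption of a small hypothetical data frame `toy` with some NA entries: either filter incomplete rows before plotting, or pass `na.rm = TRUE` to the geom so that missing values are dropped without a warning.

```r
library(ggplot2)

# Hypothetical toy data with missing values in both variables
toy <- data.frame(
  length_1_mm = c(40, 55, NA, 85, 90),
  weight_g = c(1.2, NA, 6.1, 11.0, 13.5)
)

# Option 1: keep only rows where both plotted variables are present
toy_complete <- toy[!is.na(toy$length_1_mm) & !is.na(toy$weight_g), ]

p_filtered <- ggplot(toy_complete, aes(x = length_1_mm, y = weight_g)) +
  geom_point()

# Option 2: keep the data as is and let the geom drop missing values silently
p_silent <- ggplot(toy, aes(x = length_1_mm, y = weight_g)) +
  geom_point(na.rm = TRUE)

# Number of rows that cannot be drawn as points
n_dropped <- nrow(toy) - nrow(toy_complete)
n_dropped
```

Filtering first makes the number of dropped observations explicit, which can be useful to report.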

geom_point

ggplot() +
  geom_point(
    data = and_vertebrates,
    aes(
      x = length_1_mm,
      y = weight_g
    )
  )
  • data and aes can also be local to a layer
  • Here, this does not change the result
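
A small sketch of the difference between global and local definitions, using a hypothetical data frame `toy`: with a single layer both variants compute the same points, but a further layer added to the local variant has nothing to inherit and needs its own data and aes.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  length_1_mm = c(40, 55, 70, 85),
  weight_g = c(1.2, 2.9, 6.1, 11.0)
)

# Global: data and aes are set in ggplot() and inherited by geom_point()
p_global <- ggplot(toy, aes(x = length_1_mm, y = weight_g)) +
  geom_point()

# Local: data and aes are set only for this one layer
p_local <- ggplot() +
  geom_point(data = toy, aes(x = length_1_mm, y = weight_g))

# Both variants compute the same data for the point layer
same_points <- isTRUE(all.equal(layer_data(p_global), layer_data(p_local)))

# A second layer added to p_local has to bring its own data and aes
p_two_layers <- p_local +
  geom_line(data = toy, aes(x = length_1_mm, y = weight_g))
```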

aes(color): mapping color to a variable

It looks like there are two groups in the data. Map the color of the points to a variable by adding it to the aesthetics:

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = species
  )
) +
  geom_point()
  • Map the species variable to the color aesthetic of the plot

aes(size): mapping size to a variable

We can do the same with size:

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    size = species
  )
) +
  geom_point()

aes(shape): mapping shape to a variable

We can do the same with shape:

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    shape = species
  )
) +
  geom_point()

Combine color, size and shape

We can also combine these aesthetics and map a different variable to each:

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = unittype,
    shape = species,
    size = year
  )
) +
  geom_point()
  • This is a bit too much for this plot, but it can sometimes be useful

Changing the scales of the aesthetics

The scales onto which the aesthetic elements are mapped can be changed.

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = species
  )
) +
  geom_point()
  • Exponential relationship?
  • What does it look like on the log scale?

scale_x_log10

The scales onto which the aesthetic elements are mapped can be changed.

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = species
  )
) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
  • Scales can be changed for all elements of aes with functions named scale_<aes-name>_<scale-type>
  • In this example, we scale the x and the y aesthetic to log10
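
To illustrate what the log10 scales do, here is a minimal sketch with hypothetical data that follow an exact power law (weight = 0.00001 * length^3, an assumption made only for this example). The scales transform the values before plotting, and a power law of the form weight = a * length^b becomes a straight line with slope b on log-log axes.

```r
library(ggplot2)

# Hypothetical data following an exact power law: weight = 1e-5 * length^3
toy <- data.frame(length_1_mm = c(30, 50, 80, 120, 200))
toy$weight_g <- 1e-5 * toy$length_1_mm^3

p_log <- ggplot(toy, aes(x = length_1_mm, y = weight_g)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# The computed layer data hold the log10-transformed values
built <- layer_data(p_log)
x_is_log10 <- isTRUE(all.equal(built$x, log10(toy$length_1_mm)))

# A straight-line fit on the transformed values recovers the exponent
slope <- unname(coef(lm(y ~ x, data = built))["x"])
slope
```

The axis labels still show the original units, only the spacing of the values changes.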

geom_smooth

Add a smoothing line that helps see patterns in the data

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = species
  )
) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  scale_x_log10() +
  scale_y_log10()
  • With method = "lm", a linear regression line is added
  • All geoms are drawn separately for each species because color is mapped globally
  • alpha sets the transparency of the points (0 = fully transparent, 1 = opaque)
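
The effect of a global versus a local color mapping on geom_smooth can be sketched as follows. The data frame `toy` and its species names are hypothetical, and `formula = y ~ x` is spelled out only to avoid the informational message about the default formula.

```r
library(ggplot2)

# Hypothetical toy data with two species
toy <- data.frame(
  length_1_mm = c(30, 40, 50, 60, 70, 35, 45, 55, 65, 75),
  weight_g = c(1, 2, 4, 7, 11, 2, 4, 7, 11, 16),
  species = rep(c("Species A", "Species B"), each = 5)
)

# color mapped globally: points and regression lines are split by species
p_global <- ggplot(toy, aes(x = length_1_mm, y = weight_g, color = species)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)

# color mapped only in geom_point(): one regression line for all data
p_local <- ggplot(toy, aes(x = length_1_mm, y = weight_g)) +
  geom_point(aes(color = species), alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)

# Count the regression lines computed by the second layer (geom_smooth)
n_lines_global <- length(unique(layer_data(p_global, 2)$group))
n_lines_local <- length(unique(layer_data(p_local, 2)$group))
c(global = n_lines_global, local = n_lines_local)
```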

geom_boxplot

Compare groups using a boxplot

ggplot(
  and_vertebrates,
  aes(
    x = species,
    y = length_1_mm
  )
) +
  geom_boxplot()

geom_boxplot

Compare groups using a boxplot

ggplot(
  and_vertebrates,
  aes(
    x = species,
    y = length_1_mm
  )
) +
  geom_boxplot(notch = TRUE)
  • If notches don’t overlap, the medians of the groups are likely different

geom_boxplot

Map the unittype to the color aesthetic of the boxplot

ggplot(
  and_vertebrates,
  aes(
    x = species,
    y = length_1_mm,
    color = unittype
  )
) +
  geom_boxplot()

geom_boxplot

Map the unittype to the fill aesthetic of the box

ggplot(
  and_vertebrates,
  aes(
    x = species,
    y = length_1_mm,
    fill = unittype
  )
) +
  geom_boxplot(notch = TRUE)

geom_histogram

ggplot(
  and_vertebrates,
  aes(
    x = length_1_mm,
    fill = section
  )
) +
  geom_histogram()
  • Careful: By default the histogram is stacked for the different groups!

geom_histogram

ggplot(
  and_vertebrates,
  aes(
    x = length_1_mm,
    fill = section
  )
) +
  geom_histogram(
    position = "identity",
    alpha = 0.5
  )
  • Change the position of the histogram to "identity" if you don’t want it stacked
  • alpha < 1 makes the overlapping areas visible
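
The difference between the two positions can be checked on the computed bars. This is a sketch with hypothetical data and an arbitrarily chosen `binwidth = 10`: with stacking, some bars start above zero because they sit on top of another group, while with "identity" every bar starts at zero.

```r
library(ggplot2)

# Hypothetical toy data with two forest sections
toy <- data.frame(
  length_1_mm = c(50, 52, 60, 61, 51, 59, 62, 70),
  section = rep(c("CC", "OG"), each = 4)
)

base_plot <- ggplot(toy, aes(x = length_1_mm, fill = section))

# Default: bars of the groups are stacked on top of each other
p_stacked <- base_plot +
  geom_histogram(binwidth = 10)

# position = "identity": bars of the groups overlap
p_identity <- base_plot +
  geom_histogram(binwidth = 10, position = "identity", alpha = 0.5)

# ymin is the lower edge of each bar
some_bars_lifted <- any(layer_data(p_stacked)$ymin > 0)
all_bars_from_zero <- all(layer_data(p_identity)$ymin == 0)
```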

geom_tile

You can create a simple heatmap with geom_tile

ggplot(
  and_vertebrates,
  aes(
    x = section,
    y = species,
    fill = weight_g
  )
) +
  geom_tile()
  • Here we would have to choose a different color scheme to see differences
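
One possible way to make such a heatmap easier to read, sketched with hypothetical data: summarise the values to one number per tile first (here the mean, computed with base R `aggregate()`, which is an addition to the approach shown above) and use a viridis fill scale.

```r
library(ggplot2)

# Hypothetical toy data: several measurements per section and species
toy <- data.frame(
  section = rep(c("CC", "OG"), each = 4),
  species = rep(c("Species A", "Species B"), times = 4),
  weight_g = c(2, 10, 4, 14, 3, 8, 5, 12)
)

# One row per section-species combination holding the mean weight
mean_weight <- aggregate(weight_g ~ section + species, data = toy, FUN = mean)

# One tile per combination, colored by mean weight
p_heat <- ggplot(mean_weight, aes(x = section, y = species, fill = weight_g)) +
  geom_tile() +
  scale_fill_viridis_c()

mean_weight
```

Without the summary step, all observations of a combination are drawn as tiles on top of each other, so only the last one drawn is visible.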

Small multiples with facet_wrap

Split your plots along one variable with facet_wrap

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = species
  )
) +
  geom_point() +
  facet_wrap(~section)

Small multiples with facet_grid

Split your plots along two variables with facet_grid

ggplot(
  data = and_vertebrates,
  aes(
    x = length_1_mm,
    y = weight_g,
    color = unittype
  )
) +
  geom_point() +
  facet_grid(section ~ species)
  • facet_grid(rows ~ columns)

Now you

Task 1.1 - 1.2 (45 min)

Exploratory data analysis with the penguin data set

Find the task description here

(Don’t do the “Beautify” task)

Artwork by Allison Horst

Change appearance of points

ggplot(and_vertebrates, aes(
  x = length_1_mm,
  y = weight_g
)) +
  geom_point(
    size = 4,
    shape = 17,
    color = "blue",
    alpha = 0.5
  )

Shape and color codes
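
The difference between mapping an aesthetic to a variable and setting it to a fixed value can be sketched like this (the data frame `toy` is hypothetical): a mapping goes inside aes(), a fixed value goes outside of it as an argument of the geom.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  length_1_mm = c(40, 55, 70, 85),
  weight_g = c(1.2, 2.9, 6.1, 11.0),
  species = c("Species A", "Species A", "Species B", "Species B")
)

# Mapping: the color depends on a variable, so it goes inside aes()
p_mapped <- ggplot(toy, aes(x = length_1_mm, y = weight_g)) +
  geom_point(aes(color = species))

# Setting: the color is a fixed value, so it goes outside aes()
p_fixed <- ggplot(toy, aes(x = length_1_mm, y = weight_g)) +
  geom_point(color = "blue", size = 4, shape = 17, alpha = 0.5)

# Colors that end up on the points in each plot
colors_mapped <- unique(layer_data(p_mapped)$colour)
colors_fixed <- unique(layer_data(p_fixed)$colour)
```

A fixed value placed inside aes() by accident is treated as a one-level variable, which gives a default color and an unwanted legend.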

Change color scale

We can also save a plot in a variable

g <- ggplot(and_vertebrates, aes(
  x = length_1_mm,
  y = weight_g,
  color = species
)) +
  geom_point()

g


  • Other plot layers can still be added to g

scale_color_viridis_d

Change the colors of the color aesthetic:

g +
  scale_color_viridis_d(
    option = "cividis"
  )


  • The viridis color palettes are designed to remain readable for viewers with common forms of color blindness
  • Different options of viridis color palettes: "magma", "inferno", "plasma", "viridis", "cividis"

scale_color_manual

We can also manually specify colors:

g +
  scale_color_manual(
    values = c(
      "darkolivegreen4",
      "darkorchid3"
    )
  )
  • Length of color vector has to match number of levels in your aesthetic
  • Specify colors
    • Via their name
    • Via their Hex color codes (use websites to generate your own color palettes, e.g. here)
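
As a sketch of a slightly more robust variant, the colors can be given as a named vector so that each color is tied to one level regardless of the order of the levels. The data frame `toy` and the level names "Species A" and "Species B" are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data with two species
toy <- data.frame(
  length_1_mm = c(40, 55, 70, 85),
  weight_g = c(1.2, 2.9, 6.1, 11.0),
  species = c("Species A", "Species A", "Species B", "Species B")
)

# Names tie each color to one level, independent of the order in the vector
species_colors <- c(
  "Species B" = "darkorchid3",
  "Species A" = "darkolivegreen4"
)

p_manual <- ggplot(toy, aes(x = length_1_mm, y = weight_g, color = species)) +
  geom_point() +
  scale_color_manual(values = species_colors)

# Colors assigned to the four points, in the row order of the data
point_colors <- layer_data(p_manual)$colour
```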

Other color scales

You can use the paletteer package to access color scales from many packages.

# install.packages("paletteer")
library(paletteer)
g <- g +
  scale_color_paletteer_d(
    palette = "ggsci::default_uchicago"
  )
g
  • Use scale_color_paletteer_d for discrete and scale_color_paletteer_c for continuous color scales
  • Check out all palettes available here

scale_fill_* vs. scale_color_*

ggplot(
  and_vertebrates,
  aes(
    x = section,
    y = length_1_mm,
    color = unittype)) +
  geom_boxplot() +
  scale_color_paletteer_d(
    palette = "ggsci::default_uchicago"
  )
# Same plot, but with unittype mapped to fill instead of color
ggplot(
  and_vertebrates,
  aes(
    x = section,
    y = length_1_mm,
    fill = unittype)) +
  geom_boxplot() +
  scale_fill_paletteer_d(
    palette = "ggsci::default_uchicago"
  )

labs: Change axis and legend titles and add plot title

g <- g +
  labs(
    x = "Length [mm]",
    y = "Weight [g]",
    color = "Species",
    title = "Length-Weight relationship",
    subtitle = "There seems to be an exponential relationship",
    caption = "Data from the `lterdatasampler` package"
  )
g

theme_*: change appearance

ggplot2 offers many pre-defined themes that we can apply to change the appearance of a plot.

g +
  theme_classic()
g +
  theme_bw()

g +
  theme_minimal()
g +
  theme_dark()

theme(): customize theme

You can manually change a theme or even create an entire theme yourself. The elements you can control in the theme are:

  • titles (plot, axis, legend, …)
  • labels
  • background
  • borders
  • grid lines
  • legends

If you want a full list of what you can customize, have a look at

?theme
  • Look here for an overview of the elements that you can change and the corresponding functions

theme(): customize theme

To edit a theme, just add another theme() layer to your plot.

g +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text = element_text(
      face = "bold"
    ),
    plot.background = element_rect(
      fill = "lightgrey",
      color = "darkred"
    )
  )

Theme elements follow this basic pattern:

theme(
  element_name = element_function()
)
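
A few concrete instances of this pattern, as a sketch with hypothetical data: text elements take element_text(), lines take element_line(), rectangles take element_rect(), and element_blank() removes an element entirely. Some settings, such as legend.position, take a plain value instead of an element function.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  x = 1:4,
  y = c(2, 4, 3, 6),
  group = c("a", "a", "b", "b")
)

p_themed <- ggplot(toy, aes(x = x, y = y, color = group)) +
  geom_point() +
  labs(title = "Custom theme") +
  theme_minimal() +
  theme(
    # text element: bold plot title
    plot.title = element_text(face = "bold", size = 14),
    # line element: light grey major grid lines
    panel.grid.major = element_line(color = "grey80"),
    # remove the minor grid lines completely
    panel.grid.minor = element_blank(),
    # rectangle element: white panel without border
    panel.background = element_rect(fill = "white", color = NA),
    # plain value instead of an element function
    legend.position = "bottom"
  )

# Render the plot into a grob to check that all theme settings are valid
themed_grob <- ggplotGrob(p_themed)
```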

theme_set(): set global theme

You can set a global theme that will be applied to all ggplot objects in the current R session.

# Globally set theme_minimal as the default theme
theme_set(theme_minimal())

Add this to the beginning of your script.

You can also specify some defaults, e.g. the text size:

theme_set(theme_minimal(base_size = 16))

This is very practical if you want to achieve a consistent look, e.g. for a scientific journal.
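
If the global theme should only apply to part of a script, the previous theme can be kept and restored later. This sketch relies on theme_set() invisibly returning the theme that was active before; the data frame `toy` is hypothetical.

```r
library(ggplot2)

# theme_set() returns the previously active theme, so it can be stored
old_theme <- theme_set(theme_minimal(base_size = 16))

# Hypothetical toy plot; it uses the global theme
# that is active at the time the plot is drawn
toy <- data.frame(x = 1:3, y = c(2, 4, 3))
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point()

# Restore the previous theme, e.g. at the end of the script
theme_set(old_theme)
restored <- identical(theme_get(), old_theme)
```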

ggsave()

A ggplot object can be saved to disk in different formats.

Without specifications:

# save plot g in img as my_plot.pdf
ggsave(filename = "img/my_plot.pdf", plot = g)
# save plot g in img as my_plot.png
ggsave(filename = "img/my_plot.png", plot = g)

Or with specifications:

# save a plot named g in the img directory under the name my_plot.png
# with width 16 cm and height 9 cm
ggsave(
  filename = "img/my_plot.png",
  plot = g,
  width = 16,
  height = 9,
  units = "cm"
)

Have a look at ?ggsave to see all options.
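
A self-contained sketch of saving a plot, assuming a hypothetical toy plot `g_toy` and an output folder inside the temporary directory of the R session. The folder is created first with `dir.create()`, because saving can fail if the target directory does not exist yet.

```r
library(ggplot2)

# Hypothetical toy plot
toy <- data.frame(x = 1:3, y = c(2, 4, 3))
g_toy <- ggplot(toy, aes(x = x, y = y)) +
  geom_point()

# Hypothetical output folder inside the temporary directory of this session
out_dir <- file.path(tempdir(), "img")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "my_plot.pdf")

# Save the plot with width 16 cm and height 9 cm
ggsave(
  filename = out_file,
  plot = g_toy,
  width = 16,
  height = 9,
  units = "cm"
)

file_written <- file.exists(out_file)
```

The file format is taken from the file extension, so changing the name to my_plot.png would write a PNG instead.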

Now you

Task 2 (30 min)

Make your penguin plots more beautiful

Find the task description here